Using 5 ms segments in concatenative speech synthesis
Abstract
A concatenative speech synthesis system has greater potential to generate natural speech when it uses a larger number of short speech segments, since the variety of possible concatenations becomes greater. In this paper, we propose the use of very short speech segments (5 ms, one pitch period at a pitch of 200 Hz) for concatenative speech synthesis. The proposed method is applied to the CMU ARCTIC speech database, and 100 sentences are synthesized. Although the synthesized speech preserves the speaker's identity and sounds natural enough, it also contains some noise caused by inappropriate unit selection, and the formant changes are awkward in some vowel regions.
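The 5 ms figure follows from the pitch period: at a fundamental frequency of 200 Hz, one period lasts 1/200 s = 5 ms. As a minimal sketch, the snippet below converts that duration into a sample count and cuts a signal into consecutive 5 ms segments. The 16 kHz sampling rate, the placeholder signal, and the function names are assumptions made for illustration; they are not details stated in the abstract.

```python
def segment_length_samples(pitch_hz: float, sample_rate_hz: int) -> int:
    """Number of samples in one pitch period, rounded to the nearest sample."""
    return round(sample_rate_hz / pitch_hz)


def split_into_segments(samples, seg_len):
    """Cut a sequence into consecutive, non-overlapping segments of seg_len samples.

    A trailing remainder shorter than seg_len is dropped.
    """
    n_full = len(samples) // seg_len
    return [samples[i * seg_len:(i + 1) * seg_len] for i in range(n_full)]


SAMPLE_RATE_HZ = 16000  # assumed sampling rate, not taken from the abstract
PITCH_HZ = 200.0        # pitch whose period defines the 5 ms segment

seg_len = segment_length_samples(PITCH_HZ, SAMPLE_RATE_HZ)
one_second = [0.0] * SAMPLE_RATE_HZ  # placeholder signal (silence)
segments = split_into_segments(one_second, seg_len)
```

Under these assumptions each segment holds 80 samples and one second of speech yields 200 segments, which gives a sense of how quickly the pool of candidate units grows when segments are this short.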
Similar resources
Utilization of an HMM-based feature generation module in 5 ms segment concatenative speech synthesis
- Spectrum at each segment boundary, for calculation of concatenation cost
(2) Synthesis stage
- Text-to-Feature
  - Generate features from input text (linguistic/prosodic information)
- Feature-to-Speech
  - Find the N-best candidates in each frame (preselection) according to segment's target cost
  - Find the best path from the N-best candidates based on concatenation cost
  - Concatenate the segments...
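The Feature-to-Speech steps listed above (preselect N-best candidates per frame by target cost, then find the lowest-cost path by concatenation cost) can be sketched as a small Viterbi search. This is only an illustration under stated assumptions: the Euclidean distance used for both costs, the dictionary layout with `features`, `start`, and `end` keys, and the toy database are all hypothetical and not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def preselect(target, database, n_best):
    """Indices of the n_best segments with the lowest target cost.

    Target cost is assumed here to be the distance to the target features.
    """
    ranked = sorted(range(len(database)),
                    key=lambda i: euclidean(target, database[i]["features"]))
    return ranked[:n_best]


def best_path(candidates, database):
    """Viterbi search: pick one candidate per frame, minimizing the summed
    concatenation cost (assumed: distance between boundary spectra)."""
    cost = [0.0] * len(candidates[0])  # best accumulated cost per candidate
    back = []                          # back-pointers, one list per transition
    for t in range(1, len(candidates)):
        new_cost, pointers = [], []
        for cur in candidates[t]:
            options = [
                cost[k] + euclidean(database[prev]["end"], database[cur]["start"])
                for k, prev in enumerate(candidates[t - 1])
            ]
            k_best = min(range(len(options)), key=options.__getitem__)
            new_cost.append(options[k_best])
            pointers.append(k_best)
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest final candidate to the first frame.
    j = min(range(len(cost)), key=cost.__getitem__)
    path = [candidates[-1][j]]
    for t in range(len(back) - 1, -1, -1):
        j = back[t][j]
        path.append(candidates[t][j])
    path.reverse()
    return path


# Toy database: one-dimensional features and boundary "spectra" (hypothetical).
database = [
    {"features": [0.0], "start": [0.0], "end": [1.0]},
    {"features": [0.1], "start": [5.0], "end": [5.0]},
    {"features": [1.0], "start": [1.0], "end": [2.0]},
    {"features": [1.1], "start": [9.0], "end": [9.0]},
]
targets = [[0.0], [1.0]]  # one target feature vector per frame
candidates = [preselect(t, database, n_best=2) for t in targets]
path = best_path(candidates, database)
```

Keeping the target cost in the preselection step and the concatenation cost in the path search mirrors the two-step description above; the final waveform would then be obtained by joining the segments on `path` in order.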
Synthesis Units for Conversational Speech - Using Phrasal Segments -
This paper describes the use of phrase-sized segments for the concatenative synthesis of conversational speech and discusses the differences in selection criteria that become necessary when the source corpus contains several years of conversational speech samples. It claims that natural-sounding conversational speech can be reproduced by use of such phrase-sized chunks for concatenation, and tha...
Databases of Heterogeneous Segments for Concatenative Speech Synthesis
Heterogeneous segments can enhance the quality of concatenative speech synthesis, especially for highly inflected languages. In this paper we present a brief analysis of the segment types on a general level and discuss the problems related to optimising databases of heterogeneous segments. We present a brief discussion of the algorithmic complexity of the proposed approach and offer some heur...
On the Detection of Discontinuities in Concatenative Speech Synthesis
Over the last decade, considerable work has been done on finding an objective distance measure that can predict audible discontinuities in concatenative speech synthesis. Speech segments in concatenative synthesis are extracted from disjoint phonetic contexts, and discontinuities in spectral shape and phase mismatches tend to occur at unit boundaries. Many feature sets, most of them of spectral na...
IRWIN AND JOAN JACOBS CENTER FOR COMMUNICATION AND INFORMATION TECHNOLOGIES A Hybrid Text-to-Speech System that Combines Concatenative and Statistical Synthesis Units
Concatenative synthesis and statistical synthesis are the two main approaches to text-to-speech (TTS) synthesis. Concatenative TTS (CTTS) stores segments of natural speech features, selected from a recorded speech database. Consequently, CTTS systems enable speech synthesis with natural quality. However, as the footprint of the stored data is reduced, desired segments are not always available in t...